{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Binary classification with the crabs data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The package is designed to handle models of the following general form:\n",
    "$$\\begin{aligned}\n",
    "\\mathbf{y} \\ |\\ \\mathbf{f}, \\theta &\\sim  \\prod_{i=1}^n p(y_i \\ | \\ f_i,\\theta), \\\\\n",
    "    \\mathbf{f} \\ | \\ \\theta &\\sim \\mathcal{GP}\\left(m_{\\theta}(\\mathbf{x}), k_{\\theta}(\\mathbf{x}, \\mathbf{x}')\\right),\\\\\n",
    "      \\theta &\\sim p(\\theta), \n",
    "\\end{aligned}$$\n",
    "where $\\mathbf{y}=(y_1,y_2,\\ldots,y_n) \\in \\mathcal{Y}$ and $\\mathbf{x} \\in \\mathcal{X}$ are the observations and covariates, respectively, and $f_i:=f(\\mathbf{x}_i)$ is the latent function, which we model with a Gaussian process prior. We assume that the responses $\\mathbf{y}$ are independent and identically distributed, and as a result the likelihood $p(\\mathbf{y} \\ | \\ \\mathbf{f}, \\theta)$ can be factorized over the observations. \n",
    "\n",
    "In the case where the observations are Gaussian distributed, the marginal likelihood and predictive distribution can be derived analytically. See the  [Regression notebook](https://github.com/STOR-i/GaussianProcesses.jl/blob/master/notebooks/Regression.ipynb) for an illustration."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example we show how the GP **Monte Carlo** function can be used for **supervised classification**. We use the `crabs` dataset from the R package `MASS`, in which we are interested in predicting whether a crab is of the blue or orange colour form. Our aim is to perform a Bayesian analysis, calculating the posterior distribution of the latent GP function $\\mathbf{f}$ and the model parameters $\\theta$ from the training data $\\{\\mathbf{X}, \\mathbf{y}\\}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "using GaussianProcesses, RDatasets\n",
    "import Distributions:Normal\n",
    "\n",
    "srand(113355)\n",
    "\n",
    "crabs = dataset(\"MASS\",\"crabs\");              # load the data \n",
    "crabs = crabs[shuffle(1:size(crabs)[1]), :];  # shuffle the data\n",
    "\n",
    "train = crabs[1:div(end,2),:];\n",
    "\n",
    "y = Array{Bool}(size(train, 1));            # response vector\n",
    "y[train[:Sp].==\"B\"]=0;                      # map colour form: B (blue) -> false\n",
    "y[train[:Sp].==\"O\"]=1;                      # map colour form: O (orange) -> true\n",
    "\n",
    "X = convert(Array,train[:,4:end]);          # predictors"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We assume a zero-mean GP with a Matérn 3/2 kernel. We use the automatic relevance determination (ARD) variant of the kernel, which allows each dimension of the predictor variables to have its own length scale. As this is binary classification, we use the Bernoulli likelihood,\n",
    "\n",
    "$$\n",
    "y_i \\sim \\mathrm{Bernoulli}(\\Phi(f_i)),\n",
    "$$\n",
    "where $\\Phi: \\mathbb{R} \\rightarrow [0,1]$ is the cumulative distribution function of a standard Gaussian. It acts as a squashing function that maps the latent GP value to the interval $[0,1]$, giving the probability that $y_i=1$. \n",
    "\n",
    "**Note** that `BernLik` requires the observations to be of type `Bool` and, unlike some likelihood functions (e.g. Student-t), does not contain any parameters to be set at initialisation."
   ]
  },
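  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick illustration of the squashing behaviour (a sketch, not part of the package), $\\Phi$ can be evaluated directly using the `Distributions` package:\n",
    "\n",
    "```julia\n",
    "using Distributions\n",
    "\n",
    "# The probit link: squash a real-valued latent f into a probability in [0, 1]\n",
    "Φ(f) = cdf(Normal(0, 1), f)\n",
    "\n",
    "Φ(0.0)     # 0.5: a latent value of zero gives an even chance that y = 1\n",
    "Φ(2.0)     # ≈ 0.977: large positive f pushes the probability towards 1\n",
    "Φ(-2.0)    # ≈ 0.023: large negative f pushes it towards 0\n",
    "```"
   ]
  },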
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#Select mean, kernel and likelihood function\n",
    "mZero = MeanZero();                # Zero mean function\n",
    "kern = Matern(3/2,zeros(5),0.0);   # Matern 3/2 ARD kernel (note that hyperparameters are on the log scale)\n",
    "lik = BernLik();                   # Bernoulli likelihood for binary data {0,1}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We fit the GP using the general `GP` function. This function is shorthand for `GPMC`, which is used to generate **Monte Carlo approximations** of the latent function when the **likelihood is non-Gaussian**. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GP Monte Carlo object:\n",
       "  Dim = 5\n",
       "  Number of observations = 100\n",
       "  Mean function:\n",
       "    Type: GaussianProcesses.MeanZero, Params: Float64[]\n",
       "  Kernel:\n",
       "    Type: GaussianProcesses.Mat32Ard, Params: [-0.0, -0.0, -0.0, -0.0, -0.0, 0.0]\n",
       "  Likelihood:\n",
       "    Type: GaussianProcesses.BernLik, Params: Any[]\n",
       "  Input observations = \n",
       "[16.2 11.2 … 11.6 18.5; 13.3 10.0 … 9.1 14.6; … ; 41.7 26.9 … 28.4 42.0; 15.4 9.4 … 10.4 16.6]\n",
       "  Output observations = Bool[false, false, false, false, true, true, false, true, true, true  …  false, true, false, false, false, true, false, false, false, true]\n",
       "  Log-posterior = -161.209"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gp = GP(X',y,mZero,kern,lik)      # Fit the Gaussian process model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We assign `Normal` priors from the `Distributions` package to each of the Matérn kernel parameters. If the mean and likelihood functions also contained parameters, then we could set priors on these in the same way, using `gp.m` and `gp.lik` in place of `gp.k`, respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6-element Array{Distributions.Normal{Float64},1}:\n",
       " Distributions.Normal{Float64}(μ=0.0, σ=2.0)\n",
       " Distributions.Normal{Float64}(μ=0.0, σ=2.0)\n",
       " Distributions.Normal{Float64}(μ=0.0, σ=2.0)\n",
       " Distributions.Normal{Float64}(μ=0.0, σ=2.0)\n",
       " Distributions.Normal{Float64}(μ=0.0, σ=2.0)\n",
       " Distributions.Normal{Float64}(μ=0.0, σ=2.0)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "set_priors!(gp.k,[Normal(0.0,2.0) for i in 1:6])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Samples of the latent function $\\mathbf{f} \\ | \\ \\mathbf{X},\\mathbf{y},\\theta$ are drawn using MCMC. The MCMC routine uses the `Klara` package. By default, the `mcmc` function uses the Hamiltonian Monte Carlo algorithm. Alternative samplers, such as `Klara.MALA()` and `Klara.NUTS()`, can also be used (see the [Klara](https://github.com/JuliaStats/Klara.jl) documentation for more details). Unless otherwise specified, the MCMC sampler defaults to 1,000 iterations with no burn-in phase and no thinning of the Markov chain. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "BasicMCJob:\n",
      "  Variable [1]: p (BasicContMuvParameter)\n",
      "  GenericModel: 1 variables, 0 dependencies (directed graph)\n",
      "  MALA sampler: drift step = 0.1\n",
      "  VanillaMCTuner: period = 100, verbose = false\n",
      "  BasicMCRange: number of steps = 4996, burnin = 1000, thinning = 5"
     ]
    }
   ],
   "source": [
    "samples = mcmc(gp;sampler=Klara.MALA(0.1),nIter=5000,burnin=1000,thin=5);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We test the predictive accuracy of the fitted model against a held-out test set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "test = crabs[div(end,2)+1:end,:];          # select test data\n",
    "\n",
    "yTest = Array{Bool}(size(test)[1]);        # test response data\n",
    "yTest[test[:Sp].==\"B\"]=0;                  # convert characters to booleans\n",
    "yTest[test[:Sp].==\"O\"]=1;\n",
    "\n",
    "xTest = convert(Array,test[:,4:end]);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the posterior samples $\\{\\mathbf{f}^{(i)},\\theta^{(i)}\\}_{i=1}^N$ from $p(\\mathbf{f},\\theta \\ | \\ \\mathbf{X},\\mathbf{y})$, we can make predictions $\\hat{y}$ with the `predict_y` function, conditioning on each MCMC sample in turn."
   ]
  },
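  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Averaging the per-sample predictions gives the Monte Carlo estimate of the predictive probability (a sketch of the estimator, with $\\mathbf{x}_*$ denoting a test point):\n",
    "$$\n",
    "p(y_* = 1 \\ | \\ \\mathbf{x}_*, \\mathbf{X}, \\mathbf{y}) \\approx \\frac{1}{N}\\sum_{i=1}^N \\Phi\\left(f^{(i)}(\\mathbf{x}_*)\\right).\n",
    "$$"
   ]
  },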
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "ymean = Array{Float64}(size(samples,2),size(xTest,1));   # one row of predictions per posterior sample\n",
    "\n",
    "for i in 1:size(samples,2)\n",
    "    set_params!(gp,samples[:,i])           # set the GP parameters to the i-th posterior sample\n",
    "    update_target!(gp)                     # recompute the posterior for the new parameters\n",
    "    ymean[i,:] = predict_y(gp,xTest')[1]   # predictive mean at the test points\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For each posterior sample we plot the predicted observations $\\hat{y}$ (lines) and overlay the true observations from the held-out data (circles)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"\" />"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "using Plots\n",
    "gr()                                             # use the GR backend\n",
    "\n",
    "plot(ymean',leg=false,html_output_format=:png)   # one line of predictions per posterior sample\n",
    "scatter!(yTest)                                  # overlay the held-out observations"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 0.6.0",
   "language": "julia",
   "name": "julia-0.6"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "0.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
